Abstract

Given a word w and a Parikh vector V, an abelian run of period V in w is a maximal occurrence of a
substring of w having abelian period V. Our main result is an online algorithm that, given a word w of
length n over an alphabet of cardinality σ and a Parikh vector V, returns all the abelian runs of period V
in w in time O(n) and space O(σ + p), where p is the norm of V, i.e., the sum of its components. We also
present an online algorithm that computes all the abelian runs with periods of norm p in w in time O(np),
for any given norm p. Finally, we give an O(n^2)-time offline randomized algorithm for computing all the
abelian runs of w. Its deterministic counterpart runs in O(n^2 log σ) time.
1. Introduction

Computing maximal (non-extendable) repetitions in a word is a classical topic in the area of string 
algorithms (see for example [1] and references therein). Maximal repetitions of substrings, also called runs, 
give information on the repetitive regions of a word, and are used in many applications, for example in the 
analysis of genomic sequences. 

Kolpakov and Kucherov [2] gave the first linear-time algorithm for computing all the runs in a word and
conjectured that any word of length n contains fewer than n runs. Recently, Bannai et al., using the
notion of Lyndon roots of a run, proved this conjecture and designed a much simpler algorithm computing
the runs.

Here we deal with a generalization of this problem to the commutative setting. Recall that an abelian 
power is a concatenation of two or more words that have the same Parikh vector, i.e., that have the same 
number of occurrences of each letter of the alphabet. For example, aababa is an abelian square, since aab 
and aba both have two a’s and one b, i.e., the same Parikh vector V = (2,1). When an abelian power 
occurs within a word, one can search for its “maximal” occurrence by extending it to the left and to the 
right character by character without violating the condition on the number of occurrences of each letter. 
Following the approach of Constantinescu and Ilie, we say that a Parikh vector V is an abelian period of
a word w if w can be written as w = u_0 u_1 ··· u_{k-1} u_k for some k ≥ 1 where for 0 < i < k all the u_i's have
the same Parikh vector V and the Parikh vectors of u_0 and u_k are contained in V. If k > 2, we say that the
word w is periodic with period V. Note that the factorization above is not necessarily unique. For example,
a · bba · bba · ε and ε · abb · abb · a (ε denotes the empty word) are two factorizations of the word abbabba, both
corresponding to the abelian period (1,2). Moreover, the same word can have different abelian periods.

In this paper we define an abelian run of period V in a word w as an occurrence of a substring v of w 
such that v is periodic with abelian period V and this occurrence cannot be extended to the left nor to the 
right by one letter into a substring periodic with period V. 

For example, let w = ababaaa. Then the prefix ab · ab · a = w[0..4] has abelian period (1,1) but it is not
an abelian run since the prefix a · ba · ba · a = w[0..5] also has abelian period (1,1). The latter, on the other
hand, is an abelian run of period (1,1) in w.

Looking for abelian runs in a word can be useful to detect regions in the word where there is some kind 
of non-exact repetitiveness, for example regions with several consecutive occurrences of a substring or its 
reversal. 

Matsuda et al. [6] recently presented an offline algorithm for computing all abelian runs of a word of
length n in O(n^2) time. Notice, however, that the definition of abelian run in [6] is slightly different from
the one we consider here. We compare both versions in Section 2. Basically, our notion of abelian run is
more restrictive than the one of [6], for which we use the term “anchored run”.

We first present an online algorithm that, given a word w of length n over an alphabet of cardinality
σ and a Parikh vector V, returns all the abelian runs of period V in w in time O(n) and space O(σ + p),
where p is the norm of V, that is, the sum of its components. This algorithm improves upon a previously
known one, which runs in time O(np). Next, we give an O(np)-time online algorithm for computing all the abelian
runs with periods of norm p of a word of length n, for any given p. Finally, we present an O(n^2) (resp.
O(n^2 log n))-time offline randomized (resp. deterministic) algorithm for computing all the abelian runs of
a word of length n.

The rest of this article is organized as follows. Sect. 2 introduces central concepts and fixes the notation.
In Sect. 3 we review the results on abelian runs given in [6]. Sect. 4 is devoted to the presentation of our
main result: a new solution for computing the abelian runs for a given Parikh vector. In Sect. 5 we apply
this algorithm in a procedure for computing the abelian runs with periods of a given norm. Next, in Sect. 6
we design a solution for computing all the abelian runs, which builds upon the result recalled in Sect. 3.
Finally, we conclude in Sect. 7.

2. Definitions and Notation 

Let Σ = {a_1, a_2, ..., a_σ} be a finite ordered alphabet of cardinality σ, and let Σ* be the set of finite words
over Σ. We assume that the mapping between a_i and i can be evaluated in constant time for 1 ≤ i ≤ σ. We
let |w| denote the length of the word w. Given a word w = w[0..n-1] of length n > 0, we write w[i] for the
(i+1)-th symbol of w and, for 0 ≤ i ≤ j < n, we write w[i..j] to denote a fragment of w from the (i+1)-th
symbol to the (j+1)-th symbol, both included. This fragment is an occurrence of a substring w[i] ··· w[j].
For 0 ≤ i ≤ n, w[i..i-1] denotes the empty fragment. We let |w|_a denote the number of occurrences of
the symbol a ∈ Σ in the word w. The Parikh vector of w, denoted by P_w, counts the occurrences of each
letter of Σ in w, that is, P_w = (|w|_{a_1}, ..., |w|_{a_σ}). Notice that two words have the same Parikh vector if and
only if one word is a permutation (i.e., an anagram) of the other. Given the Parikh vector P_w of a word
w, we let P_w[i] denote its i-th component and |P_w| its norm, defined as the sum of its components. Thus,
for w ∈ Σ* and 1 ≤ i ≤ σ, we have P_w[i] = |w|_{a_i} and |P_w| = P_w[1] + ··· + P_w[σ] = |w|. Finally, given two Parikh
vectors V, Q, we write V ⊆ Q if V[i] ≤ Q[i] for every 1 ≤ i ≤ σ. If additionally V ≠ Q, we write V ⊂ Q
and say that V is contained in Q.
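
As a small illustration of this notation, the following Python sketch computes Parikh vectors and tests the two containment relations. The function names and the representation of the ordered alphabet as a string are assumptions made for this example only.

```python
from collections import Counter

def parikh(word, alphabet):
    """Parikh vector of word, with components ordered as in alphabet."""
    counts = Counter(word)
    return tuple(counts[a] for a in alphabet)

def contained_or_equal(v, q):
    """V ⊆ Q: every component of v is at most the matching component of q."""
    return all(x <= y for x, y in zip(v, q))

def strictly_contained(v, q):
    """V ⊂ Q: V ⊆ Q and V differs from Q."""
    return contained_or_equal(v, q) and tuple(v) != tuple(q)

# aab and aba are anagrams, so they share the Parikh vector (2, 1).
print(parikh("aab", "ab"), parikh("aba", "ab"))
```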

Definition 1 (Abelian period). A factorization w = u_0 u_1 ··· u_{k-1} u_k satisfying k ≥ 1, P_{u_1} = ··· =
P_{u_{k-1}} = V, and P_{u_0} ⊂ V ⊃ P_{u_k} is called a periodic factorization of w with respect to V. If a word w
admits such a factorization, we say that V is an (abelian) period of w.

We call fragments u_0 and u_k respectively the head and the tail of the factorization, while the remaining
factors are called cores. Note that the head and the tail are of length strictly smaller than |V|; in particular
they can be empty.
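
Definition 1 can be checked by brute force: since the head is shorter than |V|, it is enough to try every head length. The sketch below is only illustrative and makes no attempt at efficiency; the function names and the triple (head length, number of cores, tail length) used to describe a factorization are assumptions of this example.

```python
from collections import Counter

def parikh(word, alphabet):
    counts = Counter(word)
    return tuple(counts[a] for a in alphabet)

def strictly_contained(q, v):
    return q != v and all(x <= y for x, y in zip(q, v))

def periodic_factorizations(word, v, alphabet):
    """All periodic factorizations of word w.r.t. v (assumes |v| >= 1),
    each given as (head length, number of cores, tail length)."""
    p, n = sum(v), len(word)
    found = []
    for h in range(min(p, n + 1)):
        if not strictly_contained(parikh(word[:h], alphabet), v):
            continue
        pos, cores = h, 0
        # Every full block after the head must be a core.
        while pos + p <= n and parikh(word[pos:pos + p], alphabet) == v:
            pos += p
            cores += 1
        tail = word[pos:]
        if len(tail) < p and strictly_contained(parikh(tail, alphabet), v):
            found.append((h, cores, len(tail)))
    return found

print(periodic_factorizations("abbabba", (1, 2), "ab"))
```

For abbabba and V = (1,2) this lists the two factorizations from the introduction, with heads of length 1 and 0, and additionally ab · bab · ba, which has a single core.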
Figure 1: The tuple (i, h, t, j) denotes an occurrence of a substring starting at position i, ending at position j, and having
abelian period V with head length h and tail length t. 


Observe that a periodic factorization with respect to a fixed period is not unique. However, it suffices
to specify |u_0| to indicate a particular factorization. When dealing with factorizations of fragments of a
fixed text, it is more convenient to use a different quantity for this aim. Suppose w[i..j] = u_0 ··· u_k is a
factorization with respect to abelian period V with p = |V|. Observe that consecutive starting positions
i_1, ..., i_k of factors u_1, ..., u_k differ by exactly p. Hence, they share a common remainder modulo p, which
we call the anchor of the factorization. Note that the anchor does not change if we trim a factorization of
w[i..j] to a factorization of a shorter fragment, or if we extend it to a factorization of a longer fragment.

Definition 2 (Anchored period). A fragment w[i..j] has an abelian period V anchored at k if it has a
periodic factorization with respect to V whose anchor is k mod p.

If w has a factorization with at least two cores, we say that w is periodic with period V (anchored at k 
if k mod p is the anchor of the factorization). 

Definition 3 (Abelian run). A fragment w[i..j] is called an abelian run with period V if it is periodic
with period V and maximal with respect to this property (i.e., each of w[i-1..j] and w[i..j+1] either
does not exist or is not periodic with period V).

We shall often represent an abelian run w[i..j] as a tuple (i, h, t, j) where h and t are respectively the lengths
of the head and of the tail of a periodic factorization of w[i..j] with period V and at least two cores
(see Figure 1). Note that (i + h) mod p is the anchor of the factorization, and that (j - i - h - t + 1)/p is
the number of cores; in particular, it is an integer.

Observe that an abelian run with period V may have several valid factorizations. For example, a · ba · ba · ε
and ε · ab · ab · a are factorizations of a run w[0..4] with period V = (1,1) in w = ababa. Therefore the run
can be represented as (0,1,0,4) and as (0,0,1,4). However, in v = abab only (0,0,0,3) is a representation
of v[0..3] as an abelian run with period V = (1,1). This is because (0,1,1,3) corresponds to a factorization
v[0..3] = a · ba · b with one core only, and such a factorization does not indicate that v[0..3] is an abelian run.
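
Definition 3 can be turned directly into a brute-force reference procedure, which is handy for testing faster algorithms on short words. The following Python sketch is written under that assumption only: it is far slower than the algorithms of this paper, and all names in it are hypothetical.

```python
from collections import Counter

def parikh(word, alphabet):
    counts = Counter(word)
    return tuple(counts[a] for a in alphabet)

def strictly_contained(q, v):
    return q != v and all(x <= y for x, y in zip(q, v))

def is_periodic(word, v, alphabet):
    """True if word has a periodic factorization w.r.t. v with >= 2 cores."""
    p, n = sum(v), len(word)
    for h in range(min(p, n + 1)):
        if not strictly_contained(parikh(word[:h], alphabet), v):
            continue
        pos, cores = h, 0
        while pos + p <= n and parikh(word[pos:pos + p], alphabet) == v:
            pos += p
            cores += 1
        tail = word[pos:]
        if cores >= 2 and len(tail) < p and \
                strictly_contained(parikh(tail, alphabet), v):
            return True
    return False

def abelian_runs_bruteforce(word, v, alphabet):
    """All pairs (i, j) such that word[i..j] is an abelian run with period v."""
    n = len(word)
    runs = []
    for i in range(n):
        for j in range(i, n):
            if not is_periodic(word[i:j + 1], v, alphabet):
                continue
            # Maximality: no one-letter extension may stay periodic.
            left = i > 0 and is_periodic(word[i - 1:j + 1], v, alphabet)
            right = j + 1 < n and is_periodic(word[i:j + 2], v, alphabet)
            if not left and not right:
                runs.append((i, j))
    return runs

print(abelian_runs_bruteforce("ababaaa", (1, 1), "ab"))
```

On w = ababaaa with V = (1,1) the only fragment reported is w[0..5], in agreement with the example given in the introduction.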

Matsuda et al. [6] gave a different definition of abelian runs, where maximality is with respect to extending
a fixed factorization. In this paper, we call such fragments anchored (abelian) runs.

Definition 4 (Anchored run [6]). A fragment w[i..j] is a k-anchored abelian run with period V if w[i..j]
is periodic with period V anchored at k and maximal with respect to this property (i.e., each of w[i-1..j]
and w[i..j+1] either does not exist or is not periodic with period V anchored at k).

Note that every abelian run is an anchored run with the same period (for some anchor). The converse 
is not true, since it might be possible to extend an anchored run preserving the period but not the anchor. 
For example, in the word w = ababaaa considered in the introduction, the fragment w[0..4] = ε · ab · ab · a is
a 0-anchored run but not an abelian run, since w[0..5] = a · ba · ba · a is periodic with abelian period (1,1).

Since a factorization is uniquely determined by the anchor, standard inclusion-maximality is equivalent
to the condition in Definition 4.

Observation 5. Let w[i..j] and w[i'..j'] be fragments of w with abelian period V anchored at k. If w[i..j]
is properly contained in w[i'..j'] (i.e., i' < i and j ≤ j', or i' ≤ i and j < j'), then w[i..j] is not a k-anchored
abelian run with period V.
Abelian runs enjoy the same property, but its proof is no longer trivial.

Lemma 6. Let w[i..j] and w[i'..j'] be fragments of w with abelian period V. If w[i..j] is properly contained
in w[i'..j'], then w[i..j] is not an abelian run with period V.

PROOF. We assume that i' < i. The case of j < j' is symmetric. For a proof by contradiction suppose that
w[i..j] is an abelian run and let w[i..j] = u_0 ··· u_k be a periodic factorization with period V and at least
two cores (i.e., satisfying k ≥ 3). A periodic factorization of w[i'..j'] can be trimmed to a factorization
w[i-1..j] = v_0 ··· v_ℓ. However, since w[i..j] is an abelian run, this factorization must have at most one
core (i.e., ℓ ≤ 2). Moreover, u_0 ··· u_k cannot be extended to a factorization w[i-1..j] = u'_0 u_1 ··· u_k. In
other words u'_0, the extension of u_0 by one letter to the left, must satisfy P_{u'_0} ⊄ V.

Let p = |V|. The conditions on the number of cores imply |u_0| + 2p ≤ |w[i..j]| and |w[i-1..j]| < |v_0| + 2p.
Consequently, |u'_0| = |u_0| + 1 < |v_0|, i.e., u'_0 is a proper prefix of v_0. This yields P_{u'_0} ⊆ P_{v_0} ⊂ V, which is in
contradiction with P_{u'_0} ⊄ V. □

Corollary 7. Let w be a word. For a fixed Parikh vector V, there is at most one abelian run with abelian
period V starting at each position of w.

3. Previous Work 

Matsuda et al. [6] presented an algorithm that computes all the anchored runs of a word w of length n in
O(n^2) time and space. The initial step of the algorithm is to compute maximal abelian powers
in w. Recall that an abelian power is a concatenation of several abelian-equivalent words. In other words,
an abelian power of period V is a word admitting a periodic factorization with respect to V with an empty
head, an empty tail and at least two cores. A fragment w[i..j] is a maximal abelian power if it cannot be
extended to a longer power of period V (preserving the anchor). Formally, the maximality conditions are

1. P_{w[i-p..i-1]} ≠ P_{w[i..i+p-1]} or i - p < 0, and
2. P_{w[j-p+1..j]} ≠ P_{w[j+1..j+p]} or j + p ≥ n,
where p = |V|.

The approach of [6] is to first compute all the abelian squares using the algorithm by Cummings &
Smyth [8]. The next step is to group squares into maximal abelian powers. For this, it suffices to merge
pairs of overlapping abelian squares of the form w[i..i+2p-1] and w[i+p..i+3p-1]. This way maximal
abelian powers are computed in O(n^2) time.

Observe that there is a natural one-to-one correspondence between maximal abelian powers and anchored 
runs: it suffices to trim the head and the tail of the factorization of an anchored run to obtain a maximal 
abelian power. Hence, the last step of the algorithm is to compute the maximal head and tail by which each 
abelian power can be extended. This could be done naively in O(n^3) time overall, but a clever computation
makes it possible to find all the abelian runs in time and space O(n^2) (see [6] for further details).

In Section 6 we extend this result to compute the abelian runs only rather than all the anchored runs.
Both these algorithms work offline: they need to know the whole word before reporting any abelian run.
In the following two sections we give several online algorithms, which are able to report a run ending at
position i - 1 of a word w before reading w[i+1] and the following letters. Clearly, not knowing w[i] one
cannot decide whether the run could be extended to the right, so this is the optimal delay. However, these 
methods are restricted to finding runs of a given period or a given norm of the periods, respectively. 

4. Computing Abelian Runs with Fixed Parikh Vector 

In this section we present our online solution for computing all the abelian runs of a given Parikh vector 
V of norm p in a given word w. The algorithm works in O(n) time and O(σ + p) space where n = |w|.

First, in Sect. 4.1 we show how to compute all anchored runs of period V. Later, in Sect. 4.2 we modify
the algorithm to return abelian runs only. We conclude in Sect. 4.3 with an example course of actions in
our solution.
Figure 2: B_i[k] = ∞ for i - p + 1 < k < b_i.


4.1. Algorithm for Anchored Runs

We begin with a description of data maintained while scanning the string w. For an integer k, let B_i[k]
be the starting position of the longest suffix of w[0..i] which has period V anchored at k. If there is no such
suffix, we set B_i[k] = ∞. Since this notion depends on k mod p only, we store B_i[k] for 0 ≤ k < p only.

Let b_i be the starting position of the longest suffix of w[0..i] whose Parikh vector is contained in or equal
to V. In other words, we have P_{w[b_i..i]} ⊆ V and P_{w[b_i-1..i]} ⊈ V (or b_i = 0). Note that b_i = i + 1 if w[i] does
not occur in V.
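
The positions b_i can be maintained with a sliding window, because b_i never decreases as i grows. The following Python sketch shows one way to do this; the function name, the string representation of the alphabet, and the auxiliary counter excess (the number of letters occurring in the window more often than V allows) are assumptions of this example.

```python
def contained_suffix_starts(word, v, alphabet):
    """Return the list [b_0, ..., b_{n-1}] for the Parikh vector v."""
    index = {a: r for r, a in enumerate(alphabet)}
    window = [0] * len(v)   # Parikh vector of word[k..i]
    excess = 0              # letters whose window count exceeds v
    k = 0
    starts = []
    for letter in word:
        c = index[letter]
        window[c] += 1
        if window[c] == v[c] + 1:
            excess += 1
        # Shrink the window from the left until it fits into v again.
        while excess > 0:
            c = index[word[k]]
            if window[c] == v[c] + 1:
                excess -= 1
            window[c] -= 1
            k += 1
        starts.append(k)
    return starts

print(contained_suffix_starts("abaababaabbb", (2, 2), "ab"))
```

Each letter enters and leaves the window at most once, so the whole computation takes O(n) time.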

Observe that the tail of any periodic factorization of a suffix of w[0..i] must be contained in w[b_i..i].
This leads to the following characterization:

Lemma 8. Let 0 ≤ i < |w|. We have B_i[k] ≤ k for b_i ≤ k ≤ i + 1 and B_i[k] = ∞ for i - p + 1 < k < b_i.

PROOF. For b_i ≤ k ≤ i + 1, the fragment w[k..i] has abelian period V anchored at k. (The underlying
factorization has empty head, no cores and tail w[k..i], unless k = b_i = i - p + 1, when the factorization
has one core, empty head and empty tail.) Hence, we have B_i[k] ≤ k directly from the definition.

For i - p + 1 < k < b_i, the tail of the factorization with anchor k mod p would need to start at position
k, which is impossible (see Figure 2). □

The values b_{i-1} and b_i are actually sufficient to describe B_i based on B_{i-1}.

Lemma 9. For 0 ≤ i < |w| the following equalities hold:

1. B_i[k] = ∞ ≠ B_{i-1}[k] for max(i - p + 2, b_{i-1}) ≤ k < b_i,

2. B_i[k] = B_{i-1}[k] for b_i ≤ k ≤ i and for i - p + 1 < k < b_{i-1},

3. B_i[i + 1] = b_i if b_i > i - p + 1 and B_i[i + 1] = B_{i-1}[i - p + 1] otherwise.

PROOF. Lemma 8 implies that B_i[k] = ∞ for i - p + 1 < k < b_i and B_{i-1}[k] = ∞ for i - p + 1 < k < b_{i-1}
(hence B_{i-1}[k] = B_i[k] in this latter case). For b_i ≤ k ≤ i, we have P_{w[k..i]} ⊆ V, so we can extend the
factorization of a suffix of w[0..i-1] whose tail starts at position k (see Figure 3).

Finally, note that B_i[i + 1] is the starting position of the maximal suffix of w[0..i] with an empty-tail
periodic factorization. If P_{w[i-p+1..i]} ≠ V (i.e., if b_i > i - p + 1), this is just w[b_i..i]. Otherwise, we can
extend the factorization of a suffix of w[0..i-1] whose tail starts at position i - p + 1. □

Having read letter w[i], we need to report anchored runs which end at position i - 1. For this, we use
the following characterization.

Lemma 10. Let i - p < k ≤ i. A fragment w[b..i-1] is a k-anchored run with period V if and only if
B_{i-1}[k] = b ≤ k - 2p and B_i[k] > b.

PROOF. Clearly an anchored run ending at position i - 1 must be a left-maximal suffix of w[0..i-1] with
a given anchor. Moreover, we must have b ≤ k - 2p so that the factorization has at least two cores and
B_i[k] > b due to right-maximality. It is easy to see that these conditions are sufficient. □
By Lemma 9, most entries of B_i are inherited from B_{i-1}, so we use a single array B and, having read
w[i], we update its entries. As evident from Lemma 10, each anchored run to be reported corresponds to a
modified entry.

The algorithm AnchoredRun(V, p, w, n) in Figure 4 implements our approach. The while loop increments
k from b_{i-1} to b_i. For k > i - p, we set B[k] to ∞ and possibly report a run. Note that k = i - p + 1
is within the scope of Case 3 rather than Case 1 in Lemma 9. However, later we set B[i + 1] to b_i if
b_i > i - p + 1 (as described in Case 3). Nevertheless, if an (i + 1)-anchored run needs to be reported, we
have B_{i-1}[i - p + 1] < ∞ = B_i[i - p + 1], so b_{i-1} ≤ i - p + 1 and thus k = i - p + 1 is considered in the loop.

Theorem 11. The algorithm AnchoredRun(V, p, w, n) computes all the anchored runs with period V of
norm p in a word w of length n in time O(n) and additional space O(σ + p).

PROOF. The correctness of the algorithm comes from Lemmas 9 and 10 and the discussion above. The external
for loop in lines 3-14 runs n + 1 times. The internal while loop in lines 4-12 cannot iterate more than n + 1
times in total since it starts with k equal to 0 and ends when k is equal to n + 1, and k can only be incremented by 1
(in line 12). The test P_{w[k..i]} ⊈ V in line 4 can be realized in constant time once we store P_{w[k..i]} and a
counter of its components for which the value is greater than in V. This data needs to be updated once we
increment i in the for loop and k in line 12. We then need to increment the component w[i] or decrement
the component w[k] of P_{w[k..i]}, respectively. The global counter needs to be updated accordingly. All the


Figure 3: B_i[k] = B_{i-1}[k] for b_i ≤ k ≤ i.


AnchoredRun(V, p, w, n)

 1  k ← 0
 2  B[0] ← k
 3  for i ← 0 to n do
 4      while k ≤ n and (i = n or P_{w[k..i]} ⊈ V) do
 5          if k > i - p then
 6              b ← B[k mod p]
 7              B[k mod p] ← ∞
 8              if b ≤ k - 2p then
 9                  h ← (k - b) mod p
10                  t ← i - k
11                  Output(b, h, t, i - 1)
12          k ← k + 1
13      if k > i - p + 1 then
14          B[(i + 1) mod p] ← k


Figure 4: Algorithm computing all the anchored runs of period V of norm p in a word w of length n.
other operations run in constant time. Thus the total time complexity of the algorithm is O(n). The space
complexity comes from the number of counters (σ) and the size of the array B (p). □

Note that the space consumption can be reduced to O(p) at the price of introducing (Monte Carlo)
randomization. Instead of storing the Parikh vectors in a plain form, we can use dynamic hash tables [9] so
that the size is proportional to the number of non-zero entries.
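
For concreteness, here is a Python transcription of the procedure of Figure 4, offered as a sketch rather than a reference implementation. It assumes that |V| ≥ 1 and that every letter of the word belongs to the given alphabet; the names anchored_runs, window and excess are choices of this example. The counter excess plays the role of the counter described in the proof of Theorem 11, so the test in line 4 becomes excess > 0.

```python
import math

def anchored_runs(word, v, alphabet):
    """Anchored runs of period v in word, as tuples (b, h, t, j)."""
    index = {a: r for r, a in enumerate(alphabet)}
    n, p = len(word), sum(v)
    window = [0] * len(v)   # Parikh vector of word[k..i]
    excess = 0              # letters whose window count exceeds v
    B = [math.inf] * p      # B[a]: start of the longest suffix anchored at a
    B[0] = 0
    k = 0
    runs = []
    for i in range(n + 1):
        if i < n:           # read word[i]
            c = index[word[i]]
            window[c] += 1
            if window[c] == v[c] + 1:
                excess += 1
        while k <= n and (i == n or excess > 0):
            if k > i - p:
                b = B[k % p]
                B[k % p] = math.inf
                if b <= k - 2 * p:      # at least two cores
                    runs.append((b, (k - b) % p, i - k, i - 1))
            if k < n:       # drop word[k] from the window
                c = index[word[k]]
                if window[c] == v[c] + 1:
                    excess -= 1
                window[c] -= 1
            k += 1
        if k > i - p + 1:
            B[(i + 1) % p] = k
    return runs

print(anchored_runs("ababaaa", (1, 1), "ab"))
```

On w = ababaaa with V = (1,1) the sketch reports both the 0-anchored run w[0..4] and the 1-anchored run w[0..5] discussed in Section 2.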

4.2. Algorithm for Abelian Runs

In this section we extend our algorithm so that it reports abelian runs only. For an offline solution, we 
could simply determine the anchored runs (using the procedure developed above) and filter out those which 
are not maximal. However, in order to obtain an online algorithm, we need a more subtle approach, which 
is based on the following characterization. 

Lemma 12. A fragment w[b..i-1] is an abelian run with period V if and only if it is an anchored run
(with period V) and for each k' the inequalities B_{i-1}[k'] ≥ b and B_i[k'] > b hold.

PROOF. By Lemma 6, an abelian run of period V cannot be properly contained in a fragment with period
V (anchored at some k'). Conditions involving B_{i-1}[k'] and B_i[k'] enforce left-maximality and
right-maximality, respectively. Since each abelian run is an anchored run (with the same period) and since all
anchored runs are periodic, the claim follows. □

To apply Lemma 12, it suffices to find an anchor k such that b = B_{i-1}[k] = min_{k'} B_{i-1}[k'] < min_{k'} B_i[k'].
There can be several such anchors and in case of ties we are going to detect the one for which the factorization
of w[b..i-1] has shortest tail. This factorization maximizes the number of cores, so if w[b..i-1] is an
anchored run with any anchor, it is with that one in particular. Note that the while loop in lines 4-12
of Algorithm AnchoredRun processes anchors b_{i-1} ≤ k < b_i in the order of decreasing tail lengths and


Run(V, p, w, n)

 1  k ← 0
 2  L ← ∅
 3  B[0] ← k
 4  Ptr[0] ← insertAtTheEnd(L, 0)
 5  for i ← 0 to n do
 6      b_min ← B[getFirst(L)]
 7      while k ≤ n and (i = n or P_{w[k..i]} ⊈ V) do
 8          if k > i - p then
 9              b ← B[k mod p]
10              B[k mod p] ← ∞
11              Delete(L, Ptr[k mod p])
12              Ptr[k mod p] ← Nil
13              if b = b_min and B[getFirst(L)] > b and b ≤ k - 2p then
14                  h ← (k - b) mod p
15                  t ← i - k
16                  Output(b, h, t, i - 1)
17          k ← k + 1
18      if k > i - p + 1 then
19          B[(i + 1) mod p] ← k
20          Ptr[(i + 1) mod p] ← insertAtTheEnd(L, (i + 1) mod p)


Figure 5: Algorithm computing all the abelian runs of period V of norm p in a word w of length n.
updates the underlying values B[k] from B_{i-1}[k] to B_i[k]. For the sought anchor k this update strictly
increases the value min_{k'} B[k'], and moreover this is the first increase of the minimum within a given
iteration of the outer for loop. Hence, we record the original minimum and check for an abelian run only if
min_{k'} B[k'] increases from that value.

To implement the procedure described above, we need to efficiently compute the smallest element in the
array B. For this, recall that B[j] can only be modified from ∞ to k (and the value k does not decrease
throughout the algorithm) or from some value back to ∞. We maintain a doubly-linked list L of all indices
j with finite B[j] such that the order of indices j in the list is consistent with the order of values B[j]. To
update the list, it suffices to insert the index j at the end of the list when setting B[j] to k, and to remove it
from the list when setting B[j] to ∞. Then the smallest value in B is attained at an argument stored as the first
element of the list L (or it is ∞, if the list is empty).

The algorithm Run(V, p, w, n), depicted in Figure 5, implements the approach described above. It uses
the following constant-time functions to operate on lists:

• insertAtTheEnd(L, e) that inserts e at the end of the doubly-linked list L and returns a pointer to
the location of e in the list;

• Delete(L, ptr) that deletes the element pointed to by ptr from the doubly-linked list L;

• getFirst(L) that returns the first element of the list L (0 if the list is empty).

The algorithm also uses an array Ptr which maps any anchor j to a pointer to the corresponding location
in the list L (or Nil if B[j] = ∞).

The discussion above proves that Run(V, p, w, n) correctly computes abelian runs with period V in w.
Its running time is the same as that of AnchoredRun(V, p, w, n) since the structure of the computations
remains the same while additional instructions run in constant time. Memory consumption is still O(p + σ)
because both L and Ptr take O(p) space.

Theorem 13. The algorithm Run(V, p, w, n) computes all the abelian runs with period V of norm p in
a word w of length n in time O(n) and additional space O(σ + p), which can be reduced to O(p) using
randomization.
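
The procedure of Figure 5 can be transcribed into Python as sketched below. This is an illustration under stated assumptions, not the authors' implementation: |V| ≥ 1, every letter of the word belongs to the alphabet, and the doubly-linked list L together with the array Ptr is emulated by a collections.OrderedDict, which supports insertion at the end, deletion by key and access to the first key in constant expected time. The names abelian_runs, window, excess and smallest are choices of this example.

```python
import math
from collections import OrderedDict

def abelian_runs(word, v, alphabet):
    """Abelian runs of period v in word, as tuples (b, h, t, j)."""
    index = {a: r for r, a in enumerate(alphabet)}
    n, p = len(word), sum(v)
    window, excess = [0] * len(v), 0
    B = [math.inf] * p
    L = OrderedDict()       # anchors with finite B, ordered by B value
    B[0] = 0
    L[0] = None
    k = 0
    runs = []

    def smallest():         # minimum of B, or infinity if L is empty
        return B[next(iter(L))] if L else math.inf

    for i in range(n + 1):
        if i < n:
            c = index[word[i]]
            window[c] += 1
            if window[c] == v[c] + 1:
                excess += 1
        b_min = smallest()
        while k <= n and (i == n or excess > 0):
            if k > i - p:
                a = k % p
                b = B[a]
                B[a] = math.inf
                L.pop(a, None)
                if b == b_min and smallest() > b and b <= k - 2 * p:
                    runs.append((b, (k - b) % p, i - k, i - 1))
            if k < n:
                c = index[word[k]]
                if window[c] == v[c] + 1:
                    excess -= 1
                window[c] -= 1
            k += 1
        if k > i - p + 1:
            a = (i + 1) % p
            B[a] = k
            L.pop(a, None)  # defensive: keep a single entry per anchor
            L[a] = None     # insert at the end of the list
    return runs

print(abelian_runs("ababaaa", (1, 1), "ab"))
```

Compared with the anchored variant, the only additional work per modified entry is one deletion, one lookup of the first element and possibly one insertion, which matches the constant overhead claimed above.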


4.3. Example

Let us see the behaviour of the algorithm on Σ = {a, b}, w = abaababaabbb and V = (2,2). Each row shows
the test performed in the while loop, the value b read from B (if any), the resulting contents of B and L, and
the new value of k (if it changes):

         test                  b       B               L            k
initial                                [0, ∞, ∞, ∞]    (0)          k = 0
i = 0    P_{w[0]} ⊆ V                  [0, 0, ∞, ∞]    (0,1)
i = 1    P_{w[0..1]} ⊆ V               [0, 0, 0, ∞]    (0,1,2)
i = 2    P_{w[0..2]} ⊆ V               [0, 0, 0, 0]    (0,1,2,3)
i = 3    P_{w[0..3]} ⊈ V       b = 0   [∞, 0, 0, 0]    (1,2,3)      k = 1
         P_{w[1..3]} ⊆ V               [1, 0, 0, 0]    (1,2,3,0)
i = 4    P_{w[1..4]} ⊆ V
i = 5    P_{w[1..5]} ⊈ V                                            k = 2
         P_{w[2..5]} ⊈ V       b = 0   [1, 0, ∞, 0]    (1,3,0)      k = 3
         P_{w[3..5]} ⊆ V               [1, 0, 3, 0]    (1,3,0,2)
i = 6    P_{w[3..6]} ⊆ V
i = 7    P_{w[3..7]} ⊈ V                                            k = 4
         P_{w[4..7]} ⊆ V
i = 8    P_{w[4..8]} ⊈ V                                            k = 5
         P_{w[5..8]} ⊈ V       b = 0   [1, ∞, 3, 0]    (3,0,2)      k = 6
         P_{w[6..8]} ⊆ V               [1, 6, 3, 0]    (3,0,2,1)
i = 9    P_{w[6..9]} ⊆ V
i = 10   P_{w[6..10]} ⊈ V                                           k = 7
         P_{w[7..10]} ⊆ V
i = 11   P_{w[7..11]} ⊈ V                                           k = 8
         P_{w[8..11]} ⊈ V      b = 1   [∞, 6, 3, 0]    (3,2,1)      k = 9
         P_{w[9..11]} ⊈ V      b = 6   [∞, ∞, 3, 0]    (3,2)        k = 10
         P_{w[10..11]} ⊆ V             [10, ∞, 3, 0]   (3,2,0)
i = 12                         b = 3   [10, ∞, ∞, 0]   (3,0)        k = 11
                               b = 0   [10, ∞, ∞, ∞]   (0)
                               h = 3, t = 1, Output(0, 3, 1, 11)    k = 12
                               b = 10  [∞, ∞, ∞, ∞]    ()           k = 13



5. Computing Abelian Runs with Fixed Parikh Vector Norm 

In this section we develop an O(np)-time algorithm to compute all abelian runs with periods of norm p.
First, we describe the algorithm for anchored runs and later generalize it to abelian runs. 

5.1. Anchored Runs 

Let us start with a simple offline algorithm which works in O(n) time to compute k-anchored runs with
period of norm p for fixed values p and k. This method is similar to the algorithm of Matsuda et al. [6]
briefly described in Section 3. Namely, it suffices to compute maximal abelian powers with periods of norm
p anchored at k, and then extend them by a head and a tail.

Define a block as any fragment of the form w[i..i+p-1] such that i ≡ k (mod p). Note that the
cores in decompositions with anchor k mod p are blocks. Finding k-anchored powers with periods of a given
norm p is very easy if the anchor is fixed. We consider consecutive blocks, naively check if they are
abelian-equivalent and merge any maximal chains of abelian-equivalent blocks. Determining the head and the tail
of the k-anchored runs is also simple. For each i ≡ k (mod p) we compute the longest suffix of w[0..i-1]
and the longest prefix of w[i+p..n-1] whose Parikh vectors are contained in P_{w[i..i+p-1]}.
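
The first step, merging maximal chains of abelian-equivalent blocks for a fixed norm p and anchor k, can be sketched in Python as follows. The sketch compares blocks naively with collections.Counter and omits the extension by a head and a tail; the function name and the output format (start and end position of each maximal power) are assumptions of this example.

```python
from collections import Counter

def maximal_abelian_powers(word, p, anchor):
    """Maximal chains of >= 2 abelian-equivalent blocks of length p whose
    starting positions are congruent to anchor modulo p, as (start, end)."""
    n = len(word)
    powers = []
    chain_start, chain_len, previous = None, 0, None
    pos = anchor % p
    while pos + p <= n:
        current = Counter(word[pos:pos + p])
        if current == previous:
            chain_len += 1              # the chain continues
        else:
            if chain_len >= 2:
                powers.append((chain_start, pos - 1))
            chain_start, chain_len = pos, 1
        previous = current
        pos += p
    if chain_len >= 2:
        powers.append((chain_start, pos - 1))
    return powers

print(maximal_abelian_powers("abaababaabbb", 4, 3))
```

For w = abaababaabbb, p = 4 and anchor 3, the blocks abab and aabb form the only chain, which is the pair of cores of the run reported in the example of Section 4.3.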

This approach can be implemented online in O(σ + p) space as follows: we scan consecutive blocks and
(naively) check their abelian equivalence. Whenever we read a full block (say, starting at position i), we
compute the longest suffix w[b_{i-1}..i-1] of w[0..i-1] whose Parikh vector is contained in P_{w[i..i+p-1]}.
This gives a periodic factorization of w[b_{i-1}..i+p-1] anchored at k mod p. We then try to extend it
to the right while reading further characters. Once it is impossible to extend the factorization, say by
letter w[j+1], we declare w[b_{i-1}..j] as a maximal fragment with period P_{w[i..i+p-1]} anchored at k. If the
decomposition has at least two cores, we report an anchored run. If we succeed in extending by a full block
(i.e., if P_{w[i..i+p-1]} = P_{w[i+p..i+2p-1]}), we do not restart the algorithm but instead we continue to extend
the factorization. This way, we guarantee that b_{i-1} > i - p whenever we start building a new factorization.

Clearly, the procedure described above computes all k-anchored runs with period of norm p. To compute
all anchored runs, we simply run it in parallel for all p possible anchors.

Theorem 14. There is an algorithm which computes online all the anchored runs with periods of norm p
in a word w of length n over an alphabet of size σ in time O(np) and additional space O(p(σ + p)), which
can be reduced to O(p^2) using randomization.

5.2. Abelian Runs 

Let us first slightly modify the algorithm presented in the previous section. Observe that whenever we
start a new phase having just read a block w[i..i+p-1], instead of performing the computations using a
simple procedure described above, we could launch the algorithm of Section 4.1 for w[max(i-p, 0)..] and
V = P_{w[i..i+p-1]}, simulate it until it needs to read w[i+p] and then feed it with newly read letters until the
maximal extension of w[i..i+p-1] anchored at k is found (i.e., until the respective entry of the B array
is set to ∞). Other anchored runs output by the algorithm should be ignored, of course. As before, if such
a process is running while we have completed reading a subsequent block, we do not start a new phase.
It is easy to see that such an algorithm is equivalent to the previous one. However, if we use the
algorithm of Section 4.2 instead, we automatically get a possibility to check whether the maximal extension
of w[i..i+p-1] anchored at k is a maximal fragment with period P_{w[i..i+p-1]}. Note that we start the
simulation at a position max(i - p, 0) which is smaller than b_{i-1}, unless the latter is 0. This guarantees that
left-maximality is correctly verified despite the fact that the fragment prior to position i - p is ignored in the
simulation. As before, we disregard any other abelian run that the algorithm of Section 4.2 may return. We
run this process in parallel for all possible anchors to guarantee that each abelian run with period of norm p
is reported exactly once. More precisely, in ambiguous cases a run is reported for the anchor corresponding
to the factorization with shortest tail, just as in Section 4.2.

Theorem 15. There is an algorithm which computes online all the abelian runs with periods of norm p in
a word w of length n over an alphabet of size σ in time O(np) and additional space O(p(σ + p)), which can
be reduced to O(p^2) using randomization.

6. Offline Algorithm for Computing All Abelian Runs 

In this section we present an O(n^2)-time offline algorithm which computes all the abelian runs. As
a starting point, we use the set of all anchored runs computed by the algorithm by Matsuda et al. (see
Section 3). Recall that all abelian runs are anchored runs with the same period. Hence, it suffices to filter
out those anchored runs which are properly contained in another anchored run with the same period. We
also need to make sure that every abelian run is reported once only (despite possibly being k-anchored for
different anchors k).

Note that this filtering can be performed independently for distinct periods. If we have a list of anchored
runs with a fixed period, sorted by the starting position, it is easy to retrieve the abelian runs of that period
with a single scan of the list. Ordering by the starting position can be performed together for all periods
so that it takes $O(r+n)$ time, where $r$ is the number of all anchored runs. Hence, the main difficulty is
grouping according to the period. For this, we shall assign to each fragment of $w$ an identifier, so that
two fragments are abelian-equivalent if and only if their identifiers are equal. The identifiers of periods can
be easily retrieved since, given a $k$-anchored run, we can easily locate one of the cores of the underlying
factorization.
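The single scan for one fixed period can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `filter_maximal_runs` and the representation of an anchored run as an inclusive `(start, end)` pair are assumptions made here, and Python's comparison sort is used for brevity instead of the linear-time ordering described above.

```python
def filter_maximal_runs(runs):
    """Keep the anchored runs that are not properly contained in another one.

    `runs` is an iterable of (start, end) pairs (inclusive positions), all
    assumed to share the same period. Duplicates, i.e. the same fragment
    reported for several anchors, are returned only once.
    """
    result = []
    max_end = -1  # largest end position among the runs scanned so far
    # Sort by start; among equal starts, put the longest run first.
    for start, end in sorted(runs, key=lambda r: (r[0], -r[1])):
        # A run is contained in an earlier one iff it does not extend
        # beyond every end position seen so far.
        if end > max_end:
            result.append((start, end))
            max_end = end
    return result
```

For example, `filter_maximal_runs([(0, 5), (0, 5), (1, 4), (2, 7), (2, 6), (3, 7)])` keeps only `(0, 5)` and `(2, 7)`.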

Thus, in the remaining part of this section we design a naming algorithm which assigns the identifiers.
A naive solution would be to generate the Parikh vectors of all substrings of $w$, sort these vectors removing
duplicates, and give each fragment the rank of its Parikh vector in that order. However, already storing the
Parikh vectors can take prohibitive $O(n^2 \sigma)$ space.
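For concreteness, the naive naming scheme might look like the sketch below. The function name `naive_identifiers` and the dictionary-based output are illustrative choices made here; the code stores one Parikh vector per fragment, which is exactly the space cost the text warns about.

```python
def naive_identifiers(w):
    """Map every fragment (i, j) of w (inclusive ends) to the rank of its
    Parikh vector among all distinct Parikh vectors of fragments of w."""
    alphabet = sorted(set(w))
    index = {c: k for k, c in enumerate(alphabet)}
    vectors = {}
    for i in range(len(w)):
        v = [0] * len(alphabet)
        for j in range(i, len(w)):
            v[index[w[j]]] += 1          # extend the fragment by w[j]
            vectors[(i, j)] = tuple(v)   # one stored vector per fragment
    # Sort the distinct vectors and use the position as the identifier.
    rank = {v: k for k, v in enumerate(sorted(set(vectors.values())))}
    return {frag: rank[v] for frag, v in vectors.items()}
```

With `w = "aababa"`, the fragments `(0, 2)` (`aab`) and `(3, 5)` (`aba`) receive the same identifier, while `(0, 1)` (`aa`) and `(1, 2)` (`ab`) receive different ones.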

To overcome this issue, we use the concept of diff-representation, originally introduced in the context of
abelian periods [10]. Observe that in a sense the Parikh vectors of fragments can be generated efficiently: for
a fixed $p$, we can first generate $V_{w[0..p-1]}$, then update it to $V_{w[1..p]}$, and so on until we reach $V_{w[n-p..n-1]}$.
In other words, the Parikh vectors of all fragments of length $p$ can be represented in a sequence so that the
total Hamming distance of the adjacent vectors is $O(n)$. The diff-representation, designed to manipulate
sequences satisfying such a property, is formally defined as a sequence of single-entry changes such that the
original sequence of vectors is a subsequence of intermediate results when applying these operations starting
from the null vector (of the fixed dimension $r$). Note that the diff-representation of a sequence of Parikh
vectors of all fragments of $w$ can be computed in time $O(n^2)$ proportional to its size. The following result
lets us efficiently assign identifiers to its elements.
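The sliding-window generation for a single length $p$ can be sketched as below. The encoding is an assumption made for illustration: each operation is a `(coordinate, delta)` pair, and `marks[i]` records how many operations must be applied to the null vector to obtain the Parikh vector of $w[i..i+p-1]$. The helper `replay` only checks the representation by materializing the vectors. The sketch assumes $1 \le p \le n$ and covers one length only; handling all lengths would concatenate such sequences.

```python
def diff_representation(w, p, index):
    """Single-entry changes generating the Parikh vectors of all length-p
    fragments of w. `index` maps each letter to its coordinate."""
    ops, marks = [], []
    for c in w[:p]:                      # build the vector of w[0..p-1]
        ops.append((index[c], +1))
    marks.append(len(ops))
    for i in range(1, len(w) - p + 1):   # slide the window by one position
        out_c, in_c = w[i - 1], w[i + p - 1]
        if out_c != in_c:                # at most two entries change
            ops.append((index[out_c], -1))
            ops.append((index[in_c], +1))
        marks.append(len(ops))
    return ops, marks


def replay(ops, marks, r):
    """Apply ops to the null vector of dimension r and return the vector
    reached at each marked step (for checking only)."""
    v, out, k = [0] * r, [], 0
    for t in range(len(ops) + 1):
        while k < len(marks) and marks[k] == t:
            out.append(tuple(v))
            k += 1
        if t < len(ops):
            coord, delta = ops[t]
            v[coord] += delta
    return out
```

The number of operations is at most $p + 2(n-p)$, in line with the $O(n)$ bound on the total Hamming distance for a fixed $p$.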

Lemma 16 ([10]). Given a sequence of vectors of dimension $r$ represented using a diff-representation of size
$m$, consider the problem of assigning integer identifiers of size $n^{O(1)}$ so that equality of vectors is equivalent
to equality of their identifiers. It can be solved in $O(r + m \log r)$ time using a deterministic algorithm and
in $O(r + m)$ time using a Monte Carlo algorithm which is correct with high probability ($1 - \frac{1}{(r+m)^c}$, where $c$
can be chosen arbitrarily large).
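To give some intuition for the Monte Carlo variant, the sketch below uses a standard random linear fingerprint; it is not claimed to be the construction of [10]. The name `fingerprint_ids`, the modulus, and the `(coordinate, delta)` / `marks` encoding of the diff-representation are assumptions of this example. Equal vectors always receive equal identifiers, whereas two distinct vectors collide only with small probability over the choice of the random weights.

```python
import random

MOD = (1 << 61) - 1  # a Mersenne prime, chosen here for illustration


def fingerprint_ids(ops, marks, r, seed=None):
    """Assign to each marked vector the value sum(v[i] * weight[i]) mod MOD.

    ops   -- list of (coordinate, delta) single-entry changes
    marks -- non-decreasing list; marks[k] is the number of ops after which
             the k-th vector of the sequence is reached
    r     -- dimension of the vectors
    """
    rng = random.Random(seed)
    weights = [rng.randrange(MOD) for _ in range(r)]
    h, ids, k = 0, [], 0
    for t in range(len(ops) + 1):
        while k < len(marks) and marks[k] == t:
            ids.append(h)
            k += 1
        if t < len(ops):
            coord, delta = ops[t]
            # Each single-entry change updates the fingerprint in O(1) time.
            h = (h + delta * weights[coord]) % MOD
    return ids
```

The running time is $O(r + m)$ for a diff-representation of size $m$. The identifiers produced here are fingerprints rather than ranks, so mapping them to a smaller range, if needed, would require an additional step.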

In our setting this yields the following result.

Theorem 17. There exists an $O(n^2)$-time randomized algorithm (Monte Carlo, correct with high probability)
which computes all abelian runs in a given word of length $n$. Additionally, there exists an $O(n^2 \log \sigma)$-time
deterministic algorithm solving the same problem.

7. Conclusions 

We gave algorithms that, given a word $w$ of length $n$ over an alphabet of cardinality $\sigma$, return all the
abelian runs of a given period $V$ in $w$ in time $O(n)$ and space $O(\sigma + p)$, or all the abelian runs with periods
of a given norm $p$ in time $O(np)$ and space $O(p(\sigma + p))$. These algorithms work in an online manner.
We also presented an $O(n^2)$ (resp. $O(n^2 \log \sigma)$)-time offline randomized (resp. deterministic) algorithm for
computing all the abelian runs in a word of length $n$. One may wonder if it is possible to further reduce
the complexities of these latter algorithms. We believe that further combinatorial results on the structure
of the abelian runs in a word could lead to novel solutions.